Abstract

A binary matrix satisfies the consecutive ones property (C1P) if its columns can be permuted
such that the 1s in each row of the resulting matrix are consecutive. Equivalently, a family of
sets F = {Q_1, ..., Q_m}, where Q_i ⊆ R for some universe R, satisfies the C1P if the symbols in R
can be permuted such that the elements of each set Q_i ∈ F occur consecutively, as a contiguous
segment of the permutation of R's symbols. We consider the C1P version on multisets and prove
that counting its solutions is difficult (#P-complete). We prove completeness results also for
counting the frontiers of PQ-trees, which are typically used for testing the C1P on sets, thus
showing that a polynomial algorithm is unlikely to exist when dealing with multisets. We use a
combinatorial approach based on parsimonious reductions from the Hamiltonian path problem,
showing that the decisional version of our problems is therefore NP-complete.

1 Introduction 

A binary matrix M of size m × n satisfies the consecutive ones property (C1P) if its n columns can
be permuted such that the 1s in each row of the resulting matrix are consecutive. An equivalent
definition holds for the columns by permuting the rows. The property is often formulated in terms
of sets: A family of sets F = {Q_1, ..., Q_m}, where each Q_i is a subset of the universe of symbols
R = {r_1, ..., r_n}, satisfies the C1P if the symbols in R can be permuted such that the elements of
each set Q_i ∈ F occur consecutively as a contiguous segment of the permutation of R's symbols.

For example, consider the universe R = {a, b,c,d, e}. The C1P is not satisfied by the family 
F = {{a, b}, {b, c}, {b, d}}, since b can have at most two adjacent symbols in any permutation of 
R. On the other hand, the family F = {{b, c}, {b, d}} satisfies the C1P: one feasible permutation 
of R is x = eacbd, but not all permutations of R are feasible (e.g. y = abcde is not, because the 
symbols {b,d} are not consecutive in y). 

The C1P on sets can be formulated as a C1P problem on the binary matrix M obtained by
associating row i with set Q_i ∈ F, and column j with element r_j ∈ R. Specifically, M_ij = 1 iff
r_j ∈ Q_i, as shown below for our example.

         abcde    eacbd
{b,c}    01100    00110
{b,d}    01010    00011
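
The correspondence between sets and matrix rows can be sketched in a few lines of Python. This is
only an illustration: the function names `incidence_matrix` and `rows_have_consecutive_ones` are
hypothetical and do not come from the paper, and the check works for one fixed column order
rather than searching for a feasible permutation.

```python
def incidence_matrix(universe, family):
    """Row i is set Q_i, column j is symbol r_j; the entry is 1 iff r_j is in Q_i."""
    return [[1 if r in q else 0 for r in universe] for q in family]


def rows_have_consecutive_ones(matrix):
    """Return True iff, in every row, the 1s form a single contiguous run."""
    for row in matrix:
        ones = [j for j, bit in enumerate(row) if bit == 1]
        # A run is contiguous when its span equals the number of 1s in it.
        if ones and ones[-1] - ones[0] + 1 != len(ones):
            return False
    return True


family = [{"b", "c"}, {"b", "d"}]
m_y = incidence_matrix("abcde", family)  # column order y = abcde
m_x = incidence_matrix("eacbd", family)  # column order x = eacbd
```

With these definitions, `rows_have_consecutive_ones(m_x)` holds while
`rows_have_consecutive_ones(m_y)` does not, in agreement with the two matrices shown in the text.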
Figure 1: Some examples of PQ-trees.



The problem of finding the orderings, namely, the permutations of R that satisfy the C1P, arises
in several situations. It was first solved efficiently by Fulkerson and Gross [FG65] in their study on
the incidence matrix of interval graphs, using an O(mn^2) time algorithm. Ghosh [Gho72] applied
the problem to information retrieval, where R is the set of input records and each Q_i is the set of
records satisfying an answer: for each Q_i, the C1P guarantees that the corresponding records can
be retrieved from consecutive storage locations. Booth and Lueker [Boo75, BL76] showed how to
find any such ordering in linear time, with respect to the number of 1s in M, with applications
to some graph problems such as planarity testing. They employed the PQ-tree data structure to
represent compactly all the orderings yielding the C1P for the given matrix M.

The PQ-tree corresponding to our example is denoted by T_1 in Figure 1. The leaves of the
PQ-tree contain the symbols of R: when reading these symbols by traversing the leaves in preorder, we
obtain the frontier of the PQ-tree. As can be seen, the frontier is one of the orderings yielding the
C1P in our example tree T_1. Further orderings can be obtained by rearranging the children of the
nodes of the PQ-tree, since they implicitly encode the sets in F. A round node in Figure 1 is called
a P-node, and its children can be rearranged in any order. A square node is called a Q-node, and its
children can only be arranged in left-to-right or right-to-left order. By conceptually performing
all the feasible rearrangements of the nodes in the PQ-tree according to the above rules, we obtain
the set of frontiers for the PQ-tree. These frontiers are in one-to-one correspondence with all the
orderings yielding the C1P for matrix M, as can be verified by inspecting our example for T_1
(namely, x_1 = acbde, x_2 = adbce, x_3 = aecbd, x_4 = aedbc, x_5 = cbdae, x_6 = dbcae, x_7 = cbdea,
x_8 = dbcea, x_9 = ecbda, x_10 = edbca, x_11 = eacbd, x_12 = eadbc).

1.1 Our problem 

Since its inception, the C1P has found many applications under several incarnations. Recent fields
of application are computational biology, stringology, and bioinformatics, namely, physical
mapping [JM97, COR98] and gene analysis [ELP03, LPW05, Par07], providing the inspiration for the
problems in this paper. More discussion on related work is given in Section 1.3. We consider the
scenario for the C1P in which the symbols in the input set R are not necessarily distinct. We
therefore investigate the problem of how to satisfy the C1P when R and the Q_i's are multisets.

To get the flavor of the problem, consider the universe R = {a, b, b, c, d} and the family F =
{{b, c}, {b, d}}. The difficulty arises from the fact that the symbol b in both Q_1 = {b, c} and
Q_2 = {b, d} can either match the same occurrence of b in R or not. The former case gives rise to
the PQ-tree T_2 in Figure 1, while the latter gives rise to the PQ-tree T_3. The set of frontiers of
one PQ-tree is not contained in the set of frontiers of the other. However, by definition of multiset,
the two occurrences of b in R are indistinguishable. What if we impose that all the occurrences
of b in R must be taken simultaneously? Then, we could not deal with permutations where the
occurrences of b are not contiguous, such as y = abcbd, even if they satisfy the C1P. Other choices
for handling multiple occurrences of b share similar drawbacks. As we will see, a polynomial-time
algorithm is unlikely to exist, contrary to what happens above for sets.
Figure 2: Relation between the counting problems described in this paper.

1.2 Our results 

In this paper, we show that the problems dealing with the C1P on multisets are hard. Specifically, 
we study the problem of counting the number of orderings. This is "simpler" than listing all the 
orderings. Note that the counting problem using standard PQ-trees on sets takes polynomial time, 
since we can use the aforementioned one-to-one correspondence between the orderings and the 
frontiers. 

The simple algorithm for sets is the following. For a given node u in the PQ-tree, apply a
recursive post-order traversal: If u is a leaf, it has just one frontier. Otherwise, let d be the number
of children of u, and f_i be the number of frontiers for the ith child of u (where f_i has been recursively
computed for 1 ≤ i ≤ d). Then, the number f of frontiers for u is f = d! × ∏_{i=1}^{d} f_i when u is a
P-node, and f = 2 × ∏_{i=1}^{d} f_i when u is a Q-node (e.g. f = 12 = 3! × 2 frontiers for T_1 in Figure 1).
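
The recursion above can be sketched as follows. The tree encoding (a leaf is a string, an internal
node is a pair `(kind, children)`) and the name `count_frontiers` are assumptions made for this
illustration. The shape used for T_1 is also an assumption, since Figure 1 is not reproduced here:
a P-node root with children a, e and a Q-node over c, b, d is consistent with the twelve orderings
listed in Section 1. The sketch is valid only when all leaf labels are distinct.

```python
from math import factorial, prod


def count_frontiers(node):
    """Count frontiers of a PQ-tree whose leaves carry pairwise distinct symbols."""
    if isinstance(node, str):          # a leaf has exactly one frontier
        return 1
    kind, children = node
    below = prod(count_frontiers(child) for child in children)
    if kind == "P":                    # children may appear in any order
        return factorial(len(children)) * below
    if kind == "Q":                    # only left-to-right or right-to-left
        return 2 * below
    raise ValueError(f"unknown node kind: {kind!r}")


# Assumed shape of T_1: P-node root over a, e and the Q-node (c, b, d).
T1 = ("P", ["a", "e", ("Q", ["c", "b", "d"])])
```

Calling `count_frontiers(T1)` evaluates 3! × (1 × 1 × 2), matching the value f = 12 given in the text.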

Our first result is to prove that the problem (denoted #FRONT) of counting the frontiers of a
PQ-tree whose leaves store the symbols of a multiset is #P-complete [Val79]. We refer the reader
to Section 3 for a proof of this result.

Turning back to the original problem (denoted #FMO) of counting the orderings for the C1P,
one could hope that a polynomial solution might exist without relying on PQ-trees. This is also
unlikely to happen. Our second result is to prove that the problem of counting the orderings for
the C1P on multisets is #P-complete. See Section 4 for the details of the proof.

An interesting implication of our findings is illustrated in Figure 2, where #HAM denotes
the well-known counting version of the Hamiltonian path problem. We observed in Section 1.1 that
a direct mapping of the orderings for the C1P in multisets into the frontiers of PQ-trees has some
intrinsic ambiguity. On the other hand, we can prove that both the counting problems #FRONT and
#FMO are #P-complete using a reduction from #HAM. By the completeness properties, it follows
that there exists an indirect mapping between the latter two problems, but we do not know how to
build it explicitly and directly (apart from the obvious composition).

Our approach is of independent interest, and the counting nature of the problems emphasizes
their combinatorial properties. To the best of our knowledge, no previous results on C1P have been
linked to either multisets or the #P class. In the known literature, other extensions of the C1P
have been shown to be NP-complete using reductions based on Hamiltonian paths (e.g. [Kou77]).
However, our approach proves a non-weaker property of completeness (since NP ⊆ #P) using
novel reductions. Since the latter ones are parsimonious (i.e. preserve the number of solutions), the
decisional version of #FRONT and #FMO is NP-complete since it suffices to test non-emptiness.

1.3 Related work 

Testing the C1P can also be done using variants of the PQ-tree data structure. Although optimal
from a theoretical viewpoint, Booth and Lueker's algorithm [BL76] is quite difficult to implement
since it builds the PQ-tree by induction on the number of rows of the matrix. For each row, it
performs a second induction from the leaves towards the root, using one of nine templates at each
node encountered in order to understand how the other nodes must be restructured.




The PC-tree is an alternative data structure introduced by Shih and Hsu in [SH99] to address
these difficulties, which can also be used to check the C1P as shown in [Hsu01]. Both the above tree
structures have remarkably simple definitions applying previously-known theorems on set families
to this domain. Also, the PC-tree gives a representation of the circular ones orderings of the matrix
M just as the PQ-tree gives a representation of all the C1P orderings.

The PQR-tree is another alternative data structure introduced by Meidanis et al. [MPT98] to 
devise a tree also for the case when the input does not satisfy the C1P. In particular, the R-node 
is introduced: it is like the P-node, except that it captures the portion of the frontier that violates 
the C1P. 

As previously mentioned, the C1P has several interesting applications since several apparently
unrelated problems reduce to it. One such problem is to decide if a given graph G is an interval
graph: in [FG65] the authors proved that a graph G is an interval graph if and only if its clique
matrix has the C1P by rows.

Another important application is in graph planarity testing: given a graph G, return a planar
embedding for G and, if it does not exist, return a Kuratowski subgraph isolator [Kur30]. In this
case, the C1P is used as a step in the Booth and Lueker algorithm [BL76] to check planarity in
linear time: this approach adds one vertex at a time, updating the PQ-tree to keep track of possible
embeddings of the subgraph induced by the vertices added so far. (A much simpler approach based on
the PC-tree has been developed in [SH99].)

Recall that not all instances of F and R enjoy the C1P. In that case, either duplication of
symbols, or "breaking" some set in F into subsets, must be allowed in order to arrange linearly
the input symbols of R. The former scenario gives rise to the problem of minimizing duplication
of symbols. The latter gives rise to the problem of minimizing the number of subsets the input
sets are split into (sometimes referred to in the literature as the consecutive block minimization problem).
Both problems, in their decision version, have been proved in [Kou77] to be NP-complete (a
1.5-approximation algorithm for the block minimization problem is described in [HL08]). For example,
the C1P instance where R = {a, b, c, d} and F = {{a, b, c}, {a, c, d}, {b, d}} has no solution. If
we allow duplication of symbols, two solutions are x = bacdb and y = dbacd (where b and d are
repeated twice in x and y respectively). If we allow some constraints not to be satisfied, an optimal
solution is z = bacd where only the set {b, d} is broken into two subsets {b} and {d}.

The question of extending the C1P to multisets, introduced in our paper, has not been studied
before, as far as we know. It has several practical implications. For example, in the field of
comparative genomics, the symbols correspond to the genes, and the multisets in F correspond
to sets of genes occurring consecutively in one or more genomes. Genes that appear together
consistently across genomes, possibly not always in the same order, are believed to be functionally
related. They often code interacting proteins and have a common functional association [MPN+99,
SLBH00, OFD+99].

Given two or more genomes, if each gene occurs in each genome exactly once, the above gene
clusters can be modeled as common intervals of the permutations as described in [UY00]. These
clusters can be detected by the algorithms in [UY00, HS01] in optimal linear time and space.

In [LPW05], the authors reduced the problem of finding the most "interesting" gene clusters to
the C1P, with respect to the definition of maximality of [ELP03]. They employed the Booth-Lueker
algorithm in order to compute the minimal consensus PQ-tree representing such clusters. The
above paper also discusses how to handle the case where the input genomes are not permutations,
but strings where some gene can be missing. For the case in which each gene can occur in a
genome multiple times, an exponential-time algorithm is presented. Multiple occurrences of the
same symbol are the way to model paralogous genes inside the same genome, and multiplicity is low
and rare in the observed cases [LPW05].



As seen in Section 1, if each leaf of the PQ-tree is labeled by a distinct gene symbol, it is easy
to count the number of different permutations represented by the PQ-tree. This number has been
used in [Par07], where the so-called P-arrangements are selected among all of these permutations,
to estimate the occurrence probability of a gene cluster in order to select the more interesting ones.
However, in the same paper, the problem of counting the number of different strings generated by a
PQ-tree when some symbol occurs more than once is left as an open issue.

2 Definitions and Terminology 

We consider a special class of strings defined over multisets, where the usual notions of inclusion,
equality, and union take into account the multiplicities of the elements in the multisets. We say
that a string s = s_1 s_2 ... s_n is drawn from a multiset R of symbols if and only if the multiset
S = {s_1, s_2, ..., s_n} satisfies the condition S ⊆ R, where s_i denotes the symbol stored in position
i of s, for 1 ≤ i ≤ n.

We also say that a multiset P occurs in a string s (or equivalently P is contained in s), if there
is a substring s_i s_{i+1} ... s_j of s, where 1 ≤ i, j ≤ n, such that P = {s_i, s_{i+1}, ..., s_j}.¹ In the latter
case, we say that P occurs at position i in s (and P is called a π-pattern [AALS03]). For example,
P = {a, c, a} occurs at position i = 1 in s = aacb, while P is not contained in the string s_2 = aabc.
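
The notion of a multiset occurring in a string can be made concrete with a short sketch. The
helper name `occurrences` is hypothetical; it compares symbol counts of each window of length |P|
with those of P, which is one straightforward (not necessarily the most efficient) way to test the
definition.

```python
from collections import Counter


def occurrences(pattern, s):
    """Return the 1-based positions i at which the multiset `pattern` occurs in s."""
    k = len(pattern)
    target = Counter(pattern)
    return [
        i + 1
        for i in range(len(s) - k + 1)
        # The window s[i:i+k] matches when it has the same symbols with the same multiplicities.
        if Counter(s[i:i + k]) == target
    ]
```

For the example in the text, `occurrences("aca", "aacb")` reports position 1, while
`occurrences("aca", "aabc")` is empty.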

We also consider Sperner collections [Eng97] in the next sections. A collection of multisets Q_1,
Q_2, ..., Q_m ⊆ R is said to be a Sperner Collection (or Sperner Family, or Sperner System) if it is
an anti-chain in the inclusion lattice over the powerset of R; namely, no multiset Q_i is contained in
any other multiset Q_j of the collection (i ≠ j). If no set Q_i is contained in the union of the others,
⋃_{j≠i} Q_j, then the Sperner Collection is said to be strict.

Given a decision problem A, we will denote by #A its counting version, where we are required
to count the number of the solutions of A [Val79]. We now introduce the #FMO problem, which
formalizes the problem of extending the Booth-Lueker approach [BL76] for the C1P to multisets.

Problem 1 (#FMO = Counting Full Multiset Orderings) Input: an instance (R, F), where
R is a multiset of symbols, and F = {Q_1, ..., Q_m} is a family of multisets Q_i ⊆ R. Output: how
many strings x can be drawn from all symbols in R (|x| = |R|), so that each Q_i is contained in x?

For example, given R = {a, b, b, c, d} and F = {{b, c}, {b, d}}, the string x = abcbd is one of the
feasible solutions of the #FMO instance (R, F). We now introduce our second problem, which requires
some additional terminology and is related to Problem 1.
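
To make Problem 1 concrete on tiny inputs, the following brute-force sketch enumerates all distinct
strings drawn from the whole multiset R and keeps those containing every Q_i. The names
`contains` and `full_multiset_orderings` are hypothetical, and the running time is exponential in
|R|, so this is only a way to experiment with the definition, not an algorithm proposed in the paper.

```python
from collections import Counter
from itertools import permutations


def contains(multiset, x):
    """True iff `multiset` occurs in x as a contiguous window (in any order)."""
    k = len(multiset)
    target = Counter(multiset)
    return any(Counter(x[i:i + k]) == target for i in range(len(x) - k + 1))


def full_multiset_orderings(R, F):
    """All distinct strings using every symbol of R in which each multiset of F occurs."""
    # The set removes duplicates caused by indistinguishable repeated symbols.
    strings = {"".join(p) for p in permutations(R)}
    return sorted(x for x in strings if all(contains(q, x) for q in F))


solutions = full_multiset_orderings("abbcd", ["bc", "bd"])
```

Here `solutions` contains the string abcbd from the example, and `len(solutions)` is the #FMO
value for this instance.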

Problem 2 (#FRONT = Counting PQ-tree Frontiers) Input: a PQ-tree T, where its leaves
are labeled with symbols that are not necessarily distinct. Output: what is the size of the set of
frontiers Fr(T) of T?

A PQ-tree is a tree-based data structure introduced in [Boo75, BL76] to represent succinctly a 
set of permutations on a set R of elements, through feasible rearrangements of the children at its 
internal nodes. PQ-trees are useful to solve problems where the goal is to find an ordering of the 
input set of elements satisfying some given constraints, as in the case for the C1P. 

Specifically, a PQ-tree is a rooted tree whose internal nodes are of two types: P-nodes that do
not define any specific ordering among their children; Q-nodes whose children can appear either in
left-to-right order or in right-to-left order. Each leaf of a PQ-tree T is labeled with a symbol of
the input alphabet R, and the frontier of T, denoted by F(T), is the permutation of the symbols
obtained by reading the labels of the leaves from left to right.

1 In order to simplify the notation, we will always assume that an index i is well defined, without explicitly writing
its range when it can be deduced from the context. For example, a nonempty substring s_i s_{i+1} ... s_j has 1 ≤ i ≤ j ≤ n.

Given two PQ-trees T and T', we say that T is equivalent to T' (written T ≡ T') if one tree
can be obtained from the other by possibly permuting the children of one or more P-nodes, and by
possibly reversing the children of some Q-nodes. The set of the frontiers of all the trees that are
equivalent to T is denoted by Fr(T).
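
The set Fr(T) can be enumerated explicitly for very small trees, which also shows what changes
when leaf labels repeat. In the sketch below, the tree encoding (a leaf is a string, an internal
node is a pair `(kind, children)`) and the name `frontiers` are assumptions made for illustration;
the enumeration is exponential and is not an algorithm from the paper.

```python
from itertools import permutations, product


def frontiers(node):
    """Return Fr(T) as a set of distinct strings; leaf labels may repeat."""
    if isinstance(node, str):
        return {node}
    kind, children = node
    child_sets = [frontiers(child) for child in children]
    n = len(children)
    if kind == "P":        # every ordering of the children is allowed
        orders = permutations(range(n))
    elif kind == "Q":      # only the given order and its reversal
        orders = [tuple(range(n)), tuple(reversed(range(n)))]
    else:
        raise ValueError(f"unknown node kind: {kind!r}")
    result = set()
    for order in orders:
        for parts in product(*(child_sets[i] for i in order)):
            result.add("".join(parts))   # equal strings collapse in the set
    return result
```

For a hypothetical P-node over the leaves a, b, b, `frontiers(("P", ["a", "b", "b"]))` yields only
three strings (abb, bab, bba), whereas the product formula for distinct leaves would give 3! = 6.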

Since a P-node (or a Q-node) having one child can be removed from T without changing Fr(T), 
and a P-node with two children can be replaced by a Q-node (it represents the "left to right" and 
"right to left" permutations only), we define the canonical form constraining each Q-node to have 
at least two children, and each P-node to have at least three children. In the rest of the paper, we 
assume that each PQ-tree is in canonical form. 

We are interested in counting the number of frontiers in Problem 2, namely, the size of Fr(T)
for a PQ-tree T. A formal description of the #P class is beyond the scope of this paper, and we
refer the interested reader to the textbooks [AB09, GJ79, Pap94]. However, we are going to use
the notion of #P-completeness to address the difficulty of our combinatorial problems, and so we
recall some basic definitions.

Let f be an integral function defined over strings in Σ*, for a given alphabet Σ. We say that
f ∈ #P if there exists a binary relation T(·, ·) such that:

• If (y, x) ∈ T, the length of solution x is polynomial in the length of input y.

• It can be verified in polynomial time that a pair (y, x) belongs to T.

• For every input y ∈ Σ*, f(y) = |{x : (y, x) ∈ T}| is the number of solutions for y.

Given two integral functions f, g defined over Σ*, we say that there exists a polynomial Turing
reduction from g to f if the function g can be computed in polynomial time by using a (polynomial)
number of calls to an oracle for f. The reduction is parsimonious if it preserves the number of
solutions.² A function f is #P-hard if for every g ∈ #P there is a polynomial reduction from g to
f. As usual, a function is #P-complete if it is both #P-hard and in #P.

3 Counting the frontiers of a PQ-tree 

We begin by discussing the completeness of the #FRONT problem. We use a reduction from the
well-known counting version of Hamiltonian Path (#HAM). We are given an undirected graph G, a
source vertex w ∈ G, and a destination vertex s ∈ G. We want to know how many paths H in G
start in w and end in s, such that all the vertices in G are traversed exactly once by each H. For
example, one such path is H = (1,3,2,4,5) in the graph G shown in Figure 3. In the rest of the
paper, we assume that G is connected, w and s have degree at least one, and the other vertices
have degree at least two (otherwise there is no Hamiltonian path). We also assume that there are
no multiple edges between the same pair of vertices and no self-loops.
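
For small graphs, #HAM can be evaluated by exhaustive search, which is handy for checking the
examples by hand. The sketch below is a naive enumeration with a hypothetical function name
`count_hamiltonian_paths`; the edge list is the one given for the example graph of Figure 3 in
Section 3.1, with w = 1 and s = 5.

```python
from itertools import permutations


def count_hamiltonian_paths(vertices, edges, w, s):
    """Count the paths from w to s that visit every vertex exactly once."""
    adjacent = {frozenset(e) for e in edges}
    inner = [v for v in vertices if v not in (w, s)]
    count = 0
    for middle in permutations(inner):
        path = (w, *middle, s)
        # Every pair of consecutive vertices on the path must be an edge of G.
        if all(frozenset((path[k], path[k + 1])) in adjacent
               for k in range(len(path) - 1)):
            count += 1
    return count


# Edges of the example graph, as listed in Section 3.1.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
```

On this edge list, `count_hamiltonian_paths(range(1, 6), EDGES, 1, 5)` finds the two paths
(1,2,3,4,5) and (1,3,2,4,5); the second is the path H used in the text.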

3.1 Construction of the PQ-trees 

The main idea is to code the structure of the given graph G in three suitable PQ-trees, T_G, T_V,
and T_E, such that each Hamiltonian path H is in one-to-many correspondence with a suitable set
of strings from their frontiers. We now describe our reduction from G = (V, E) to T_G, T_V, and T_E,
using Figure 3 as an illustrative example.

2 Hence it allows for non-emptiness testing in the decisional version of the problems. 




Figure 3: The PQ-tree T_G associated with the input graph G, where the source and the destination
vertices are w = 1 and s = 5. Note that the subtrees T_V and T_E of T_G are shown individually.

The root of T_G is a Q-node having two PQ-trees T_V and T_E as children.

Tree T_E encodes all the feasible permutations of the edges in E. The root of T_E is a P-node
having |E| + 2 children. Two of them are special "endmarkers," and are labeled with $ and #. Each
of the remaining children is a Q-node that encodes an edge e = {i, j} by two leaves labeled with i
and j, respectively, as children. In our example, T_E has |E| = 7 Q-nodes with children labeled by
{1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4}, and {4,5}, plus the endmarkers $ and #.

Tree T_V enforces a classification of the edges as "coding" a Hamiltonian path, or "non-coding"
otherwise. Specifically, the root of T_V is a Q-node with four children: one leaf labeled with $, a
PQ-tree T_C for the coding edges, one more leaf labeled with #, and a PQ-tree T_N for the
non-coding edges. The root of T_C is a Q-node with three children. The first child is a leaf labeled with
the source w and the last is a leaf labeled with the destination s. The middle child is a P-node
with |V| − 2 children, each of which is a Q-node with two leaves labeled with the same symbol i,
for i ∈ V \ {w, s}. In our example w = 1, s = 5, and |V| = 5. The root of the non-coding tree
T_N is a P-node having 2(|E| − |V| + 1) leaves as children. Letting d_i denote the degree of vertex i,
there are d_w − 1 leaves labeled with w, d_s − 1 leaves labeled with s, and d_i − 2 leaves labeled with
i ≠ w, s. In our example, the leaves are labeled with 1, 1, 2, 3, 4, 4, where 2(|E| − |V| + 1) = 6.

The above construction requires polynomial time, and the rationale will be given in Section 3.2. 

Lemma 1 Given an undirected graph G = (V, E), its corresponding PQ-trees T_G, T_V, and T_E can
be built in O(|V| + |E|) time.
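
The construction of Section 3.1 can be sketched as a single pass over the vertices and edges. The
nested-tuple encoding (a leaf is a string, an internal node is a pair `(kind, children)`) and the
name `build_trees` are assumptions made for this illustration, and vertex labels are assumed to be
single characters once converted to strings.

```python
def build_trees(vertices, edges, w, s):
    """Build (T_G, T_V, T_E) for graph (vertices, edges) with source w and destination s."""
    degree = {v: 0 for v in vertices}
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1

    # T_E: P-node over the two endmarkers and one two-leaf Q-node per edge.
    t_e = ("P", ["$", "#"] + [("Q", [str(i), str(j)]) for i, j in edges])

    # T_C: Q-node over w, a P-node of doubled inner vertices, and s.
    inner = [v for v in vertices if v not in (w, s)]
    t_c = ("Q", [str(w), ("P", [("Q", [str(v), str(v)]) for v in inner]), str(s)])

    # T_N: P-node padding with d_v - 1 copies of w and s, and d_v - 2 copies otherwise.
    padding = []
    for v in vertices:
        copies = degree[v] - (1 if v in (w, s) else 2)
        padding += [str(v)] * copies
    t_n = ("P", padding)

    t_v = ("Q", ["$", t_c, "#", t_n])
    t_g = ("Q", [t_v, t_e])
    return t_g, t_v, t_e


EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
T_G, T_V, T_E = build_trees([1, 2, 3, 4, 5], EDGES, 1, 5)
```

Each vertex and edge is handled a constant number of times, in line with the O(|V| + |E|) bound
of Lemma 1; on the example graph the padding leaves of T_N come out as 1, 1, 2, 3, 4, 4.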

3.2 Properties of the PQ-trees 

Consider the Hamiltonian path H = (1,3,2,4,5) in our example. (Observe that the reversal of
H, namely (5,4,2,3,1), is also a Hamiltonian path, but we consider it to be different from H
for the counting purposes.) The corresponding strings α_H belonging to the frontiers Fr(T_G) are
characterized as follows. First of all, each α_H is a square, namely, the concatenation α_H = αα of
two equal strings α, where α belongs to both the frontiers Fr(T_V) and Fr(T_E), and is of length
2|E| + 2. For example, α = $13322445#121434 is one such feasible string. We can characterize
the general structure of the strings α by observing that they match one of the following two
patterns. Let π denote an arbitrarily chosen permutation of the pairs in {1,2}, {1,4}, {3,4}, which
represent the edges not traversed by H. (That is, π belongs to the frontiers of the PQ-tree
resulting from {{1,2}, {1,4}, {3,4}}.) The former pattern for α is $13322445#π, where the initial
symbols are fixed and only π may vary; analogously, the latter is π#13322445$. For example,
α = 413421#13322445$ matches the latter pattern.

Having introduced the structure of α_H = αα in our example, we show how to make α satisfy
the implicit conditions encoded in T_V and T_E. Indeed, T_E guarantees that the two integers in each
of the pairs corresponding to the edges in E always occur consecutively in α. Moreover, the subtree
T_C in T_V constrains each vertex i ∈ V \ {w, s} to appear exactly twice in the chosen subset of
edges, while w and s are required to appear just once. Note that the purpose of the subtree T_N is
that of "padding" the edges in E that are not traversed by H, since we do not know a priori which
ones will be touched by H.

We now generalize the above observations on α. In the following we can restrict our focus to
paths of the form i_1, i_2, ..., i_|V|, that are permutations of {1, 2, ..., |V|} with i_1 = w and i_|V| = s
(otherwise they cannot be Hamiltonian paths from w to s). Moreover, we introduce the notation
Perm(Q) for a set Q = {{a_1, b_1}, {a_2, b_2}, ..., {a_r, b_r}} of unordered pairs. It represents the set
of all the permutations of a_1, b_1, a_2, b_2, ..., a_r, b_r such that a_l and b_l occupy contiguous positions
for 1 ≤ l ≤ r. For example, given Q = {{1,2}, {1,4}, {3,4}}, we have that 413421 is a valid
permutation in Perm(Q), while 413241 is not.
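
Membership in Perm(Q) can be tested with a short sketch. It reads the definition as it is used
for the frontiers of a P-node over two-leaf Q-nodes: the string splits into consecutive blocks of
two symbols which, as unordered pairs, are exactly the pairs of Q. That reading, the name
`in_perm`, and the encoding of each pair as a two-character string are assumptions made here.

```python
from collections import Counter


def in_perm(string, pairs):
    """True iff `string` is a sequence of two-symbol blocks realizing exactly `pairs`."""
    if len(string) != 2 * len(pairs):
        return False
    # Cut the string into blocks at even offsets and forget the order inside each block.
    blocks = [frozenset(string[k:k + 2]) for k in range(0, len(string), 2)]
    return Counter(blocks) == Counter(frozenset(p) for p in pairs)


Q = ["12", "14", "34"]   # the pairs {1,2}, {1,4}, {3,4}
```

With this encoding, `in_perm("413421", Q)` holds and `in_perm("413241", Q)` does not, matching the
two strings discussed above.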

We now show in Lemmas 2-4 that there exists a one-to-many correspondence between the
Hamiltonian paths H in G and the strings α ∈ Fr(T_V) ∩ Fr(T_E).

Lemma 2 Let G = (V, E) be an undirected graph, and T_G, T_V, and T_E be its corresponding
PQ-trees. For any string α ∈ Fr(T_V) ∩ Fr(T_E), there exists a corresponding Hamiltonian path H of G
from w to s.

Proof: Consider a string α ∈ Fr(T_V) ∩ Fr(T_E). We first show that the symbols in α follow a special
pattern.

Since α ∈ Fr(T_V), the symbols $ and # in it match those in the leaves of T_V by construction.
Assume w.l.o.g. that the first symbol of α is $. (The other case in which $ is the last symbol of α
is analogous.) Then, α is of the form α = $ τ # π by construction, where τ = τ_1 τ_2 ... τ_{2|V|−2} and
π follow the patterns described next. First, τ = w τ' s where τ' ∈ Perm({i, i}_{i ≠ w,s}), since
τ ∈ Fr(T_C): hence, τ_i = τ_{i+1} for even values of i ∈ [2 ... 2|V| − 4]. Second, π is a permutation of
the symbols in the multiset obtained by removing the symbols of τ from ⋃_{{i,j} ∈ E} {i, j}.

Now, the fact that α belongs also to Fr(T_E) puts additional constraints on τ and π. Indeed,
the Q-nodes in T_E guarantee that τ_1 and τ_2 are children of the same Q-node, τ_3 and τ_4 are children
of the next Q-node, and so on. Thus in general τ_i, τ_{i+1} for odd i belong to the same Q-node:
hence, {τ_i, τ_{i+1}} ∈ E for odd values of i ∈ [1 ... 2|V| − 3]. Combining the latter with the fact
that τ_i = τ_{i+1} for even values of i, we obtain that H = (w, τ_2, τ_4, ..., τ_{2|V|−4}, s) is a Hamiltonian path.

Note that the rest of the Q-nodes in T_E induce also some contiguity constraints on π, which will
be relevant later for the counting argument (see Lemma 5). The case α = π # τ $ is analogous. □

Lemma 3 Let G = (V, E) be an undirected graph, and T_G, T_V, and T_E be its corresponding
PQ-trees. For any Hamiltonian path H of G from w to s, there exists at least one corresponding string
α ∈ Fr(T_V) ∩ Fr(T_E).

Proof: Let H = (i_1, i_2, ..., i_|V|) be a Hamiltonian path, where i_1 = w and i_|V| = s. We define
α = $τ#π where τ and π are as follows. First, we choose τ = i_1 i_2 i_2 ... i_{|V|−1} i_{|V|−1} i_|V|, so that
τ ∈ Fr(T_C). Second, let E' = E \ {{i_j, i_{j+1}}}_{1 ≤ j ≤ |V|−1} be the set of edges not traversed by H. Let
us list the edges of E' as {a_1, b_1}, ..., {a_r, b_r}. Then we choose π = a_1 b_1 ... a_r b_r, so that π ∈ Fr(T_N).

Consequently, α belongs to Fr(T_V). It remains to see that α belongs also to Fr(T_E).
Note that the $ and # symbols in α clearly match the two endmarker leaves in T_E. Also, by
our construction of τ and π, for any edge {i, j} in E, we have that i and j appear in consecutive
positions of either τ or π. This concludes the proof, implying that α ∈ Fr(T_V) ∩ Fr(T_E). □

Lemma 4 Let Σ_H ⊆ Fr(T_V) ∩ Fr(T_E) denote the set of all the strings corresponding to a given
Hamiltonian path H, as stated in Lemma 3. Then, for any two Hamiltonian paths H ≠ H' of G
from vertex w to vertex s, we have Σ_H ∩ Σ_H' = ∅.

Proof: For any α ∈ Σ_H and α' ∈ Σ_H', we show that α ≠ α'. If one of the strings begins with the $
symbol, while the other does not, they are different since neither τ nor π contains any endmarker (e.g.
α = $τ#π is different from α' = π'#τ'$). Hence, consider the case when both α and α' begin
with $. Since the corresponding Hamiltonian paths H and H' are different, also the corresponding
"coding" strings τ and τ' will be different by construction, implying that α ≠ α'. □



3.3 Reduction from #HAM to #FRONT 

We now show how to reduce the problem #HAM of counting the Hamiltonian paths in G = (V, E),
to the problem #FRONT of counting the frontiers of PQ-trees, namely, T_G, T_V, and T_E. We denote
the number of frontiers for a PQ-tree T by |Fr(T)|. Here is the polynomial time reduction for the
input graph G and its two vertices w and s:

• Build the PQ-trees T_G, T_V, and T_E (see Lemma 1).

• Return the following integer as the number of Hamiltonian paths from w to s in G:

\[ \frac{2\,|Fr(T_V)| \times |Fr(T_E)| - |Fr(T_G)|}{2\,(|E| - |V| + 1)! \times 2^{|E| - |V| + 1}} \tag{1} \]

Clearly, the formula in (1) can be computed in polynomial time. We now show its correctness.

Lemma 5 Let Σ_H ⊆ Fr(T_V) ∩ Fr(T_E) denote the set of strings corresponding to a Hamiltonian path
H. Then, for any Hamiltonian path H from w to s, we have |Σ_H| = 2 (|E| − |V| + 1)! × 2^{|E|−|V|+1}.

Proof: Consider a string α ∈ Σ_H. As previously mentioned in the proof of Lemma 2, α matches
either the pattern $τ#π or π#τ$. Note that the string τ is uniquely determined by construction
of T_G, and the contiguity condition imposed by T_E, for the given H. Hence, |Σ_H| is twice the
number of strings π that we can obtain from T_N, under the contiguity condition imposed by T_E.
Therefore, |Σ_H| = 2 |Perm(E')|, where E' ⊆ E is the set of edges not traversed by H. Since |E'| is
p = |E| − |V| + 1, we have p! permutations of these edges and, for each of them, we have two ways
to permute every {i, j} ∈ E'. This gives a total of p! 2^p strings π. Note that we cannot generate
twice the same string in this way, because the edges are distinct as unordered pairs and, for each
pair {i, j} ∈ E', we have i ≠ j. Hence the result follows. □
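
The count p! 2^p of strings π can be checked by direct enumeration on the running example, where
E' = {{1,2}, {1,4}, {3,4}} and p = 3. The sketch below is only a sanity check of that counting
step; the name `perm_strings` and the encoding of each edge as a two-character string are
assumptions made for illustration.

```python
from itertools import permutations, product
from math import factorial


def perm_strings(pairs):
    """Enumerate the strings obtained by ordering the pairs and orienting each one."""
    out = set()
    for order in permutations(pairs):
        for flips in product((False, True), repeat=len(order)):
            # Reverse a two-symbol block when its flip flag is set.
            out.add("".join(p[::-1] if flip else p for p, flip in zip(order, flips)))
    return out


E_prime = ["12", "14", "34"]   # edges not traversed by H = (1,3,2,4,5)
strings = perm_strings(E_prime)
p = len(E_prime)
```

Because the results are collected in a set, `len(strings) == factorial(p) * 2 ** p` confirms
that no string is generated twice for this E', as argued at the end of the proof.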

Lemma 6 |Fr(T_G)| = 2 |Fr(T_V)| × |Fr(T_E)| − |Fr(T_V) ∩ Fr(T_E)|

Proof: Let L_V = Fr(T_V), L_E = Fr(T_E), and L_G = Fr(T_G). Consider L_VE = Fr(T_V) ∩ Fr(T_E),
so that we can rewrite L_V = L'_V ∪ L_VE and L_E = L'_E ∪ L_VE. Now, by construction of T_G, we
know that L_G = L_V · L_E ∪ L_E · L_V, where the standard operation "·" denotes the extension of
the string concatenation to sets of strings (i.e. A · B = {ab | a ∈ A, b ∈ B}). By expanding
L_V and L_E, we obtain that L_G = (L'_V ∪ L_VE) · (L'_E ∪ L_VE) ∪ (L'_E ∪ L_VE) · (L'_V ∪ L_VE). By
simple algebra, we have that |L_G| = |L_V · L_E| + |L_E · L_V| − |L_E ∩ L_V|. The result follows, since
|L_V · L_E| = |L_E · L_V| = |L_E| × |L_V|. □

Figure 4: Example of reduction of a Hamiltonian Path instance where the source and the destination
vertices are w = 1 and s = 2 into a #FMO instance (R, F). Sets Qij are shown boxed in string x.

We now have all the ingredients to prove the #P-completeness of the #FRONT problem.



Theorem 7 #FRONT is #P-complete.



Proof: Membership in #P trivially holds. In order to prove that the formula in (1) is
correct, observe that the sets $\Sigma_H$, for all the Hamiltonian paths $H$ from $w$ to $s$, are a partition of
$I = Fr(T_V) \cap Fr(T_E)$. To see why, note that for each string in $I$, there is a Hamiltonian path by
Lemma 2. Moreover, $\Sigma_H \subseteq I$ by Lemma 3. Finally, the sets $\Sigma_H$ are pairwise disjoint by Lemma 4.

Formula (1) is based on the fact that $|I|$ can be obtained from $|T_G|$, $|T_V|$, and $|T_E|$ by using
Lemma 6. Moreover, the sets $\Sigma_H$ all have the same size, as stated in Lemma 5. Hence, dividing these
two quantities gives an integer as a result, which is the number of Hamiltonian paths as in (1).
Note that our reduction requires polynomial time. □ 



4 Hardness results for #FMO

We now show how to reduce the #HAM problem to the counting version of the Full Multiset Problem 
(#FMO). For the given undirected graph G = (V,E), together with the source and the destination 
vertices, w and s, we make the same assumptions as in Section 3. In Section 4.1, we walk through 
the example in Figure 4 to describe the reduction. In Section 4.2, we characterize the structure of 
each string satisfying the constraints in the #FMO instance. In Section 4.3, we prove our hardness 
result on counting how many strings correspond to the same Hamiltonian path H in G. 

4.1 Instance construction 

Consider the example in Figure 4. On the left we show the input undirected graph G, where the 
source and the destination vertices w = 1 and s = 2 are in boldface. The corresponding #FMO 
instance $(R,F)$ is reported on the right, while one of the solution strings $x$, corresponding to the
Hamiltonian path $H = (1,3,4,2)$, is represented at the bottom.



We build an instance of #FMO as follows. For each vertex $i$, we construct the multiset $Q_i$
containing two occurrences of the symbol $i$ (if $i \neq w,s$), or one occurrence of $i$ and one of the
special symbol $c_i$ (if $i = w,s$). We also add symbols $d_{ij}$ and $j$ to $Q_i$, for every incident edge
$\{i,j\}$. As a result, each undirected edge $\{i,j\}$ is represented by two different symbols $d_{ij} \in Q_i$ and
$d_{ji} \in Q_j$. Formally,




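$$Q_i = \begin{cases} \bigcup_{\{i,j\} \in E} \{d_{ij}, j\} \cup \{i, c_i\}, & i = w, s \\ \bigcup_{\{i,j\} \in E} \{d_{ij}, j\} \cup \{i, i\}, & i \neq w, s \end{cases}$$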



To guarantee the condition that $w$ and $s$ are the source and the destination vertex, we introduce
two symbols $c'_w$ and $c'_s$, and two sets $R_w = \{c_w, c'_w\}$ and $R_s = \{c_s, c'_s\}$, which do not correspond to
any vertex of the input graph. They are used to guarantee that $Q_w$ and $Q_s$ will always occur as
the first and the last multiset of any solution string $x$ for our #FMO instance.

In general, the intersection between two multisets $Q_i$ and $Q_j$ can contain more symbols than
just $i$ and $j$. For example, the intersection between $Q_1$ and $Q_4$ is $I_{14} = \{1,4,2,3\}$ because it
also contains 2 and 3, each of them corresponding to a vertex forming a triangle with 1 and 4. To
avoid this situation, $2|E|$ auxiliary multisets $Q_{ij} = \{d_{ij}, j\}$ are used to constrain the intersection
between the multisets inside each solution string $x$, such that it contains exactly two symbols.
Observe that each edge $\{i,j\} \in E$ gives rise to two multisets $Q_{ij}$ and $Q_{ji}$. In the string $x$ shown
in Figure 4, the purpose of the multisets $Q_{ij}$ and $Q_{ji}$ is to force the intersection between $Q_1$ and
$Q_3$ inside $x$ to be $\{1,3\}$, between $Q_3$ and $Q_4$ to be $\{3,4\}$, and so on.

We finally choose the multiset $R = Q \setminus R'$ where $Q = \bigcup_i Q_i \cup \{c'_w, c'_s\}$ and $R' = \bigcup_{i \neq w,s} \{i, i\} \cup
\{w, s\}$. We also choose $F = \{Q_1, \ldots, Q_{|V|}\} \cup \{R_w, R_s\} \cup \{Q_{ij}, Q_{ji}\}_{\{i,j\} \in E}$. The idea behind the
construction of R and F is illustrated in our example. Each Hamiltonian path H from w = 1 to 
s = 2 contains only one edge incident to w ({1, 3} in our example), one edge incident to s ({2,4}), 
and two edges incident to each of the other vertices in H ({1,3} and {3,4} incident to 3, and 
{3,4} and {2,4} incident to 4). The path $H$ can always be represented by a string $x$ having size
$|R|$. The multisets $Q_i$ occur inside $x$ in the same order as that of the vertices $i$ inside $H$. The
intersection between consecutive $Q_i$ and $Q_j$ is now guaranteed to contain just $i, j$ in consecutive
positions of $x$. For example, $Q_1$, $Q_3$, $Q_4$, and $Q_2$ correspond to the vertices in $H = (1, 3, 4, 2)$, while
their intersections correspond to the edges used in H. Here is the role of R': since we do not know a 
priori which edges will be traversed by H, we can rely just on the multiset given by their endpoints, 
thus giving rise to R' . Even if we have to remove R' from Q to obtain R, we still guarantee that 
(R, F) is a valid #FMO instance. 
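
The construction of $(R, F)$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of the symbols, the function names, and the edge set of the example graph are not from the paper (the edge set was chosen to agree with what the text says about Figure 4, namely that $Q_1 \cap Q_4 = \{1,4,2,3\}$ and that $H = (1,3,4,2)$ is a Hamiltonian path).

```python
from collections import Counter

def build_fmo_instance(vertices, edges, w, s):
    """Build (R, F) from an undirected graph, following Section 4.1.

    Symbols use an illustrative tuple encoding (an assumption of this sketch):
    ("v", i) for the vertex symbol i, ("d", i, j) for d_ij,
    ("c", i) for c_i and ("c'", i) for c'_i.
    """
    adj = {i: set() for i in vertices}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)

    Q = {}       # vertex multisets Q_i
    pairs = {}   # auxiliary multisets Q_ij = {d_ij, j}
    for i in vertices:
        # Two copies of i, or one copy of i plus c_i for the two endpoints.
        own = [("v", i), ("c", i)] if i in (w, s) else [("v", i), ("v", i)]
        Q[i] = Counter(own)
        for j in adj[i]:
            Q[i].update([("d", i, j), ("v", j)])
            pairs[(i, j)] = Counter([("d", i, j), ("v", j)])

    R_w = Counter([("c", w), ("c'", w)])
    R_s = Counter([("c", s), ("c'", s)])

    # Q collects every occurrence from all Q_i, plus c'_w and c'_s.
    big_q = Counter([("c'", w), ("c'", s)])
    for q_i in Q.values():
        big_q.update(q_i)

    # R' holds two copies of each inner vertex and one copy of w and of s.
    r_prime = Counter()
    for i in vertices:
        r_prime[("v", i)] = 1 if i in (w, s) else 2

    R = big_q - r_prime   # multiset difference Q \ R'
    F = list(Q.values()) + [R_w, R_s] + list(pairs.values())
    return R, F, Q


def contained(m, r):
    """True if multiset m is contained in multiset r."""
    return all(r[sym] >= cnt for sym, cnt in m.items())


# Assumed edge set, consistent with the description of Figure 4.
V = [1, 2, 3, 4]
E = [(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)]
R, F, Q = build_fmo_instance(V, E, w=1, s=2)

print(sum(R.values()))                      # size of R
print(sorted((Q[1] & Q[4]).elements()))     # intersection of Q_1 and Q_4
print(all(contained(m, R) for m in F))      # containment of every multiset in R
```

On this graph the size of $R$ comes out as $4|E| + 4 = 24$, and the intersection of $Q_1$ and $Q_4$ contains the four vertex symbols, which is the situation the auxiliary multisets $Q_{ij}$ are meant to rule out inside a solution string.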

Lemma 8 Each multiset $M \in F$ is contained in $R$.

Proof: We recall that $F = \{Q_1, \ldots, Q_{|V|}\} \cup \{R_w, R_s\} \cup \{Q_{ij}, Q_{ji}\}_{\{i,j\} \in E}$, and that $R = Q \setminus R'$
where $Q = \bigcup_i Q_i \cup \{c'_w, c'_s\}$ and $R' = \bigcup_{i \neq w,s} \{i,i\} \cup \{w,s\}$. Since we assumed that the degree of
$w$ is at least one, $w$ has at least one incident edge $\{w,j\}$. By construction of the $Q_i$ multisets, it
follows that the symbol $w$ has at least two occurrences in $Q$: one occurrence belongs to $Q_w$, while
the second occurrence belongs to the multiset $Q_j$ associated with the vertex $j$. The same holds for
the destination vertex $s$, which occurs at least two times in $Q$. Since we assumed each of the
remaining vertices $i \neq w,s$ to have at least two neighbors in $G$ (say $j$ and $l$), it follows that the
symbol $i$ has at least four occurrences in $Q$: two occurrences belong to $Q_i$, the third occurrence
belongs to $Q_j$, while the fourth one belongs to $Q_l$.

From the above, it follows that $R = Q \setminus R'$ contains at least one occurrence of $w$, one occurrence
of $s$, and two occurrences of each $i \neq w,s$.

At this point, we have all the ingredients to prove that $Q_w \subseteq R$. The multiset $Q_w$ contains
exactly one occurrence of $w$, and at most one occurrence of every other symbol $i \neq w$. Moreover,
Figure 5: The string $x$ coding the Hamiltonian path $H = (i_1, \ldots, i_n)$ of $G$. Intersections between
$Q_i$ and $Q_j$ have size 2 in $x$ and are constrained to be $\{i, j\}$.



for each $d_{ij} \in Q_w$, it holds that $d_{ij} \in R$, since $d_{ij} \in Q$ and no such symbol is in $R'$. Also the symbol
$c_w$ is contained in $R$, since $c_w \in Q_w$, but $c_w \notin R'$. The same holds for $Q_s$, and for the remaining $Q_i$
multisets, with $i \neq w,s$. In the case of the $Q_i$ multisets, the symbol $i$ occurs two times inside each
$Q_i$, but this is not an issue since, as discussed above, $R$ contains at least two occurrences of each
symbol $i \neq w,s$.

To prove that each $Q_{ij} = \{d_{ij}, j\}$ and $Q_{ji} = \{d_{ji}, i\}$ is contained in $R$, it is enough to note that
$d_{ij}, d_{ji} \in Q$, but $d_{ij}, d_{ji} \notin R'$, and that for every symbol $i$ or $j$ there is at least one occurrence in $R$.

Finally, we observe that $R_w = \{c_w, c'_w\}$ and $R_s = \{c_s, c'_s\}$ are contained in $R$, since the symbols
$c_w, c_s, c'_w, c'_s$ are in $Q$, but they are not in $R'$. □

Lemma 9 Given an undirected graph $G = (V,E)$, together with a source and a destination vertex,
$w$ and $s$, the corresponding instance $(R,F)$ of #FMO can be built in $O(|V| + |E|)$ time.
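
The text states Lemma 9 without proof. A size count that supports it, assuming the graph is given by adjacency lists and each symbol is written in constant time, could go as follows, where $d_i$ denotes the degree of vertex $i$ (so that $\sum_i d_i = 2|E|$):

```latex
\begin{align*}
\sum_{i \in V} |Q_i| &= \sum_{i \in V} (2 + 2 d_i) = 2|V| + 4|E|, \\
\sum_{\{i,j\} \in E} \bigl(|Q_{ij}| + |Q_{ji}|\bigr) &= 4|E|, \qquad |R_w| + |R_s| = 4, \\
|R| &= \bigl(2|V| + 4|E| + 2\bigr) - \bigl(2(|V| - 2) + 2\bigr) = 4|E| + 4 .
\end{align*}
```

The total number of symbols written for $R$ and $F$ is therefore $2|V| + 12|E| + 8$, which is linear in $|V| + |E|$.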

4.2 Characterization of the solutions 

We need some technical lemmas, as in Section 3.2. In particular, Lemmas 10-12 follow the same 
path as that traced in Lemmas 2-4 for #FRONT. 

Lemma 10 Let $G = (V, E)$ be an undirected graph, and $(R, F)$ be its corresponding #FMO instance.
For any string $x$ that is a solution of $(R,F)$, there exists a corresponding Hamiltonian path $H$ of $G$
from $w$ to $s$.

Lemma 11 Let $G = (V, E)$ be an undirected graph, and $(R, F)$ be its corresponding #FMO instance.
For any Hamiltonian path $H$ of $G$ from $w$ to $s$, there exists at least one corresponding solution $x$
of $(R,F)$.

Lemma 12 Let $\Sigma_H$ denote the set of all the solutions of $(R, F)$ corresponding to a given Hamiltonian
path $H$, as stated in Lemma 11. Then, for any two Hamiltonian paths $H \neq H'$ of $G$ from
vertex $w$ to vertex $s$, it is $\Sigma_H \cap \Sigma_{H'} = \emptyset$.

We now prove Lemma 10, leaving the proof of Lemmas 11-12 at the end of the section. We 
consider a solution x of (R,F), and make three conceptual steps. 

(a) We prove that the multisets $Q_i$ follow a total order $\prec_x$ induced by $x$.

(b) We show that each $Q_i$ occurs exactly once in $x$.

(c) For any two consecutive $Q_i$ and $Q_j$ in the total order $\prec_x$, we demonstrate that their
intersection in $x$ corresponds to edge $\{i,j\} \in E$.



Observe that steps (a) and (b) encode all possible permutations of the vertices in $V$, while
step (c) selects only those permutations (if any) that correspond to paths in $G$. Putting (a)-(c)
together, we can see that the Hamiltonian path corresponding to $x$ is $H = (i_1, i_2, \ldots, i_{|V|})$, where
$Q_{i_1} \prec_x Q_{i_2} \prec_x \cdots \prec_x Q_{i_{|V|}}$ is the total order induced by $x$.

Figure 6: The four possible cases if $|I_{ij}| = 3$. From left to right, top-down, the case where $I_{ij} =
\{i, j, l_1\}$, $I_{ij} = \{l_1, l_2, l_3\}$, $I_{ij} = \{i, l_1, l_2\}$, or $I_{ij} = \{j, l_1, l_2\}$.

We show a slightly more general property than that stated in (a), using the following lemma. 

Lemma 13 (Strict Sperner Property) The collection of multisets $C = \{R_w, R_s, Q_1, \ldots, Q_{|V|}\}$
is a strict Sperner collection: no multiset is contained in the union of the others. Hence, there
exists a total order $\prec_x$ on the multisets in $C$.

Proof: First of all we observe that each $Q_i \in C$ contains at least one symbol that is unique in $x$
and does not belong to any other multiset. Hence, $Q_i$ cannot be contained in the union of the other
multisets. The multisets $R_w$ and $R_s$ also contain unique symbols, namely, $c'_w$ and $c'_s$. Hence, $C$ is
a strict Sperner collection: this property, combined with the fact that each multiset in $C$ occurs in
$x$, implies that a left-to-right scan of $x$ provides a total order of the multisets in $C$. That is, for
any pair $Q_i$ and $Q_j$ either $Q_i \prec_x Q_j$ or $Q_j \prec_x Q_i$. □

We prove the property stated in step (c) by the following lemma. 

Lemma 14 (Intersection Size) Let $x$ be a string of size $|R|$, drawn from all the symbols in $R$,
and containing all the multisets in $C_2 = \{R_w, R_s, Q_i, Q_{ij}\}$. Let $I_{ij} = Q_i \cap Q_j$ denote the intersection
between two multisets $Q_i$ and $Q_j$ that occur consecutively in $x$. Then, (i) $|I_{ij}| = 2$; (ii) $I_{ij} = \{i,j\}$;
(iii) $\{i,j\} \in E$.

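Proof: (i) First, let $l_1, l_2, \ldots$ denote some generic vertices that are adjacent to both $i$ and $j$. By
construction of the multisets $Q_i$, note that $I_{ij}$ can only contain the symbols $i$, $j$ or $l_p$ for $p = 1, 2, \ldots$.
Formally:

$$Q_i \cap Q_j = \begin{cases} \{i, j\} \cup \bigcup_{\{i,l_p\},\{j,l_p\} \in E} \{l_p\} & \text{if } \{i,j\} \in E \\ \bigcup_{\{i,l_p\},\{j,l_p\} \in E} \{l_p\} & \text{otherwise} \end{cases} \qquad (2)$$

Assume that $|I_{ij}| = 3$. Then, four cases are possible when considering the sets $Q_{fg}$ where
$f, g \in \{i, j, l_1, l_2, \ldots\}$ and $f \neq g$: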



1. $I_{ij} = \{i, j, l_1\}$
2. $I_{ij} = \{i, l_1, l_2\}$
3. $I_{ij} = \{j, l_1, l_2\}$
4. $I_{ij} = \{l_1, l_2, l_3\}$

We discuss case 1 (since cases 2-4 are similar), which is represented on the top left of Figure 6.
Here, it is shown that the symbols in the four multisets $Q_{ij}$, $Q_{ji}$, $Q_{il_1}$ and $Q_{jl_1}$, corresponding to
the three edges $\{i,j\}$, $\{i,l_1\}$ and $\{j,l_1\}$, cannot occur inside $Q_i$ or $Q_j$, because each symbol $d_{ij}$
only belongs to $Q_i$ (hence it cannot be a member of the intersection $I_{ij}$), and we only have one
occurrence of $l_1$ inside $Q_i$ and one occurrence inside $Q_j$.

The cases where the intersection has size larger than 3 are similar. In these cases we can always
select from $I_{ij}$ a subset of three symbols, reducing to one of the above cases: if $|I_{ij}| > 3$, we can
apply the above argument to $i$, $j$ and an arbitrary vertex in $I_{ij} \setminus \{i,j\}$.

Given the above upper bound on the size of an intersection, we now prove that $|I_{ij}|$ cannot be
smaller than 2. By Lemma 13 we know that each multiset $Q_i$ cannot be contained in the union of
the other multisets, hence in order to construct a string $x$ of size $|R|$ containing all the multisets in
$C_2$, the combined size of the intersections between the $Q_i$ multisets must be $2(|V| - 1)$. If
at least one of these intersections had size 1, then some other intersection would have size 3,
contradicting the previous upper bound. From the previous upper and lower bounds it follows that
each intersection must have size $|I_{ij}| = 2$.

(ii) To prove that $I_{ij} = \{i,j\}$, let us assume by contradiction that $I_{ij} = \{i, l\}$, where $l \neq j$ is a
vertex forming a triangle in the input graph $G$ together with $i$ and $j$. As in point (i), it is easy to
prove that the two sets $Q_{il} = \{d_{il}, l\}$ and $Q_{jl} = \{d_{jl}, l\}$ cannot occur inside the solution string $x$, since
$Q_j$ and $Q_i$ only contain one occurrence of the symbol $l$ each. The $d_{jl}$ symbol cannot be contained
in the intersection $I_{ij}$ since only the symbols $i$ and $l$ are inside.

The proofs for the other cases $I_{ij} = \{j, l_1\}$ and $I_{ij} = \{l_1, l_2\}$ are identical to this one.

(iii) The conclusion follows from point (ii) and from the intersection property highlighted
in Equation (2), stating that if $\{i,j\} \subseteq I_{ij}$, then $\{i,j\} \in E$. □

Finally, the property stated in step (b) is based on the lemma below. 

Lemma 15 (Occurrence Uniqueness) Given a solution $x$ of $(R,F)$, each multiset $Q_i \in F$ occurs
exactly once inside $x$.

Proof: We recall that each $Q_i$ occurs at least once inside $x$ since the latter is a valid solution.
Suppose by contradiction that there exists a multiset $Q_{i^*}$ that occurs twice or more inside $x$.

First, we show that all the occurrences of $Q_{i^*}$ form a run, that is, any two such occurrences
must overlap and there is no occurrence of $Q_k$ ($k \neq i^*$) between them. This is easy to see, since
each $d_{i^*j}$ occurs only once in $x$.

Second, consider all the runs in x, where a multiset occurring once is seen as a degenerate run. 
If two runs intersect, their intersection contains exactly two symbols by Lemma 14. 

Third, the run of $Q_{i^*}$ must be degenerate, thus contradicting the hypothesis that there are
at least two occurrences. Indeed, if the run of $Q_{i^*}$ is not degenerate, then $|x| > |R|$, which is
not possible. To see why, we recall that a valid solution $x$ of $(R,F)$ is required to have size
$|x| = |R| = 4|E| + 4$. Since $q = |\bigcup_i Q_i| = 4|E| + 2|V|$, some overlaps between consecutive runs are
required. As previously mentioned, the intersection of two consecutive runs contains two elements.
Hence, $r = 2|V| - 2$ is the number of symbols in the overlaps between pairs of consecutive runs
in $x$. In order to fit the required length $|R|$, the first run must also intersect $R_w$ in $c_w$, while the
last one must intersect $R_s$ in $c_s$. We should also add to these $q$ elements the two special symbols
$c'_w$ and $c'_s$, for a total of $|x| = |R| = (q + 2) - r$ elements in $x$ (and as many in $R$ as well). If the run
of $Q_{i^*}$ is non-degenerate, then its size will be at least $|Q_{i^*}| + 1$, implying that there are at least
$(q + 2 + 1) - r > |R|$ symbols in $x$. Consequently, $|x|$ would be strictly larger than $|R|$, contradicting
the validity of $x$ as a solution of $(R, F)$. □
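
The length count used in the third step can be written out explicitly. The following is only a restatement of the quantities named in the proof, with $d_i$ (the degree of vertex $i$) introduced here for the first sum:

```latex
\begin{align*}
q &= \sum_{i \in V} |Q_i| = \sum_{i \in V} (2 + 2 d_i) = 2|V| + 4|E|, \\
r &= 2\,(|V| - 1) = 2|V| - 2, \\
|x| &= (q + 2) - r = \bigl(4|E| + 2|V| + 2\bigr) - \bigl(2|V| - 2\bigr) = 4|E| + 4 = |R|, \\
(q + 2 + 1) - r &= 4|E| + 5 > |R| .
\end{align*}
```

The last line is the count for a non-degenerate run: a single extra symbol already exceeds the required length.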

It remains to prove Lemma 11 and Lemma 12. 

Let us discuss Lemma 11. Given a Hamiltonian path $H = (i_1, i_2, \ldots, i_{|V|})$ of $G$, where $i_1 = w$
and $i_{|V|} = s$, in order to construct a solution $x$ of the corresponding #FMO instance $(R,F)$, we
arrange the multisets $Q_i$ in the same order as the corresponding vertices in $H$, as shown in Figure 5.
The first symbol of $x$ is $c'_w$ and the last one is $c'_s$. Between them, $Q_{i_1}, Q_{i_2}, \ldots, Q_{i_{|V|}}$ appear in $x$,
where the first symbol of $Q_{i_1}$ is $c_w$, and the last symbol is $i_1$, and the first symbol of $Q_{i_{|V|}}$ is $i_{|V|}$
and the last symbol is $c_s$. For the remaining $Q_{i_l}$, the first three symbols are $i_l$, $i_{l-1}$ and $d_{i_l i_{l-1}}$, and
the first two of them overlap with $Q_{i_{l-1}}$ by Lemma 14. Analogously, the last three symbols are
$d_{i_l i_{l+1}}$, $i_{l+1}$ and $i_l$, and the last two of them overlap with $Q_{i_{l+1}}$. The remaining symbols in $Q_{i_l}$ are
$d_{i_l j}, j$ for all edges $\{i_l, j\} \in E$ such that $j \neq i_{l-1}, i_{l+1}$.

Each multiset $Q_{i_l}$ intersects $Q_{i_{l+1}}$ in $\{i_l, i_{l+1}\} \in E$. Note that, since $H$ is a Hamiltonian path,
the symbols belonging to the union of all the intersections are $R' = \bigcup_{i \neq w,s} \{i, i\} \cup \{w, s\}$.

To prove that $x$ is a solution of $(R,F)$, note that $x$ contains each multiset $Q_i$, $R_w$, $R_s$ by
construction. As for each $Q_{ij} = \{d_{ij}, j\}$, we observe that its occurrence is contained in the occurrence
of $Q_i$ in $x$. Moreover, $x$ contains the multiset $R$ and $x$ has size $|R|$, since $x$ is drawn from the
multiset $(\bigcup_i Q_i \cup \{c'_w, c'_s\}) \setminus R'$, which is exactly the way $R$ is defined in $(R, F)$. The above discussion
proves Lemma 11.
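
The layout described for Lemma 11 can be checked mechanically on a small graph. The sketch below is an illustration, not the authors' code: the tuple encoding of symbols, the function names, and the example adjacency lists (chosen to agree with the description of Figure 4) are assumptions. It builds one string $x$ for the path $H = (1,3,4,2)$ and tests that every multiset of $F$ occurs as a contiguous window and that $x$ uses exactly the symbols of $R$.

```python
from collections import Counter

def instance(adj, w, s):
    """Return (R, F) for the graph given as an adjacency dict (Section 4.1)."""
    F, total = [], Counter([("c'", w), ("c'", s)])
    for i in adj:
        own = [("v", i), ("c", i)] if i in (w, s) else [("v", i), ("v", i)]
        q_i = Counter(own)
        for j in adj[i]:
            q_i.update([("d", i, j), ("v", j)])
            F.append(Counter([("d", i, j), ("v", j)]))   # auxiliary Q_ij
        F.append(q_i)
        total.update(q_i)
    F += [Counter([("c", w), ("c'", w)]), Counter([("c", s), ("c'", s)])]
    r_prime = Counter({("v", i): 1 if i in (w, s) else 2 for i in adj})
    return total - r_prime, F

def build_solution(adj, path):
    """Lay out one string x for a Hamiltonian path from w to s."""
    w, s = path[0], path[-1]
    x = [("c'", w), ("c", w)]
    for pos, i in enumerate(path):
        prev = path[pos - 1] if pos > 0 else None
        nxt = path[pos + 1] if pos + 1 < len(path) else None
        if prev is not None:
            # The shared symbols i, prev were written by the previous vertex;
            # only d_{i,prev} is new at the start of Q_i.
            x.append(("d", i, prev))
        for j in sorted(adj[i] - {prev, nxt}):
            # Pairs for edges not used by the path (one of several valid orders).
            x += [("d", i, j), ("v", j)]
        if nxt is not None:
            # End of Q_i: d_{i,next}, next, i; the last two overlap Q_next.
            x += [("d", i, nxt), ("v", nxt), ("v", i)]
    x += [("c", s), ("c'", s)]
    return x

def occurs(x, m):
    """True if some contiguous window of x is a permutation of multiset m."""
    k = sum(m.values())
    return any(Counter(x[p:p + k]) == m for p in range(len(x) - k + 1))

# Assumed example graph with w = 1 and s = 2.
adj = {1: {2, 3, 4}, 2: {1, 4}, 3: {1, 4}, 4: {1, 2, 3}}
R, F = instance(adj, w=1, s=2)
x = build_solution(adj, [1, 3, 4, 2])
print(len(x) == sum(R.values()))      # x has size |R|
print(Counter(x) == R)                # x uses exactly the symbols of R
print(all(occurs(x, m) for m in F))   # every multiset in F is contiguous in x
```

The sorted order of the non-path pairs is an arbitrary choice; any other order, and either orientation of each such pair, gives another string for the same path.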

To prove Lemma 12, consider a string $x \in \Sigma_H$, and $x' \in \Sigma_{H'}$ where $H' = (i'_1, i'_2, \ldots, i'_{|V|})$.
Since $H \neq H'$, they must differ in at least one position $l$ (i.e. $i_l \neq i'_l$). Let us assume w.l.o.g. that
$|Q_{i_l}| \le |Q_{i'_l}|$, and select the position $k$ of the leftmost symbol $d_{i_l j} \in Q_{i_l}$ occurring in $x$ for some $j$.
Since the order of the multisets in $x$ is the same as that of the vertices in the Hamiltonian paths,
$Q_{i_l} \neq Q_{i'_l}$ (since $i_l \neq i'_l$). By construction of the multisets, we have $d_{i_l j} \notin Q_{i'_l}$, so the $k$th symbol
in $x$ and $x'$ differs, thus proving the claim.

4.3 Reduction from #HAM to #FMO 

The #FMO problem is clearly in #P, since we can take a solution string $x$ as a certificate. Therefore,
we focus on its completeness. 

We are given an undirected graph G = (V,E), along with its source w and its destination s. 
The reduction goes as follows. 

• Build an instance (R,F) as described in Section 4.1. 

• Let z be the number of solutions for the instance (R, F). 

• Let $a = \prod_{i=1}^{|V|} a_i \neq 0$, where $a_i$ is defined as follows for a vertex $i$ of degree $d_i$:

$$a_i = \begin{cases} 2^{(d_i - 1)} \, (d_i - 1)! & i = w, s \\ 2^{(d_i - 2)} \, (d_i - 2)! & i \neq w, s \end{cases}$$

• Return the integer z/a. 

The above reduction takes polynomial time. To see its correctness, it suffices to show that
$|\Sigma_H| = a$ for every Hamiltonian path $H = (i_1, i_2, \ldots, i_{|V|})$ in $G$.



We already proved in Section 4.2 that each solution $x \in \Sigma_H$ has the form reported in Figure 5.
Here, the occurrence of each $Q_i$ is a sequence of pairs $Q_{ij} = \{d_{ij}, j\}$ except the first and the last
symbol of $Q_i$. If $i \neq w, s$, the first and the last pairs always stay the same, while the remaining $d_i - 2$
pairs can be permuted in $(d_i - 2)!$ ways. For each such way, we can permute each pair internally,
thus giving an extra factor of $2^{d_i - 2}$. If $i = w, s$, we have $d_i - 1$ pairs that can be permuted, yielding
$2^{(d_i - 1)} (d_i - 1)!$ permutations.
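
The counting argument can be illustrated by brute force on a small graph. The sketch below is an assumption-laden illustration (symbol encoding, function names, and the example graph are not from the paper): it computes $a$ from the degrees and enumerates every string of the Figure 5 shape for one Hamiltonian path by permuting the free pairs of each vertex and flipping each of them. It only confirms that these strings are pairwise distinct and that there are $a$ of them; it does not re-check that each one is a solution of $(R,F)$.

```python
from itertools import permutations, product
from math import factorial

def multiplier(adj, w, s):
    """a = product of 2^f * f!, with f = d_i - 1 for w, s and d_i - 2 otherwise."""
    a = 1
    for i, nbrs in adj.items():
        free = len(nbrs) - (1 if i in (w, s) else 2)
        a *= 2 ** free * factorial(free)
    return a

def strings_for_path(adj, path):
    """Enumerate every string of the Figure 5 shape for one Hamiltonian path."""
    w, s = path[0], path[-1]
    per_vertex = []
    for pos, i in enumerate(path):
        prev = path[pos - 1] if pos > 0 else None
        nxt = path[pos + 1] if pos + 1 < len(path) else None
        free = sorted(adj[i] - {prev, nxt})   # neighbours not used by the path
        head = [("d", i, prev)] if prev is not None else []
        tail = [("d", i, nxt), ("v", nxt), ("v", i)] if nxt is not None else []
        blocks = []
        for order in permutations(free):
            for flips in product((False, True), repeat=len(order)):
                mid = []
                for j, flip in zip(order, flips):
                    pair = [("d", i, j), ("v", j)]
                    mid += pair[::-1] if flip else pair   # internal swap
                blocks.append(tuple(head + mid + tail))
        per_vertex.append(blocks)
    out = set()
    for choice in product(*per_vertex):
        body = [sym for block in choice for sym in block]
        out.add(tuple([("c'", w), ("c", w)] + body + [("c", s), ("c'", s)]))
    return out

# Assumed example graph with w = 1 and s = 2, and the path H = (1, 3, 4, 2).
adj = {1: {2, 3, 4}, 2: {1, 4}, 3: {1, 4}, 4: {1, 2, 3}}
print(multiplier(adj, 1, 2))
print(len(strings_for_path(adj, [1, 3, 4, 2])))
```

For this graph both printed values are 32: vertex 1 contributes $2^2 \cdot 2! = 8$, vertices 2 and 4 contribute 2 each, and vertex 3 contributes 1.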

Theorem 16 #FMO is #P-complete.

Corollary 17 Testing the C1P on multisets is NP-complete.



5 Conclusions 

In this paper, we have shown that counting the number of orderings related to the C1P on multisets is
#P-complete. Hence, a polynomial-time algorithm is unlikely to exist, contrary to what happens
for sets. Although a direct mapping of the orderings for the C1P in multisets into the frontiers of 
PQ-trees has some intrinsic ambiguity, we proved that there exists an indirect mapping between the 
two counting problems. It would be interesting to find a direct and "natural" reduction between 
the two problems, without using the counting version of the Hamiltonian path as an intermediate 
problem (see Figure 2). 